Abstract

We describe a linearizable, wait-free implementation of a one-bit
swap object from a single max register and an unbounded array of
test-and-set bits. Each swap operation takes at most three steps. Using
standard randomized constructions, the max register and test-and-set
bits can be replaced by read-write registers, at the price of raising the
cost of a swap operation to an expected O(max(log n, min(log t, n)))
steps, where t is the number of times the swap object has previously
changed its value and n is the number of processes.

1 Introduction 

A swap object supports a single read-modify-write operation swap that
returns the old contents of the object while setting a new value. The simplest
variant of a swap object is one that stores only a single bit. This variant is
equivalent to a test-and-set object that has been extended with a
test-and-reset operation, where each operation returns the old value of the
object and writes a new value (1 for test-and-set and 0 for test-and-reset),
all as an atomic operation.
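
The sequential behavior described above can be sketched in a few lines of
Python. This is only an illustration of the specification, not of the
wait-free implementation; the class name OneBitSwap and the method names
test_and_set and test_and_reset are hypothetical names chosen for this sketch.

```python
class OneBitSwap:
    """Sequential specification of a one-bit swap object (illustrative only)."""

    def __init__(self, initial=0):
        self.bit = initial

    def swap(self, v):
        # Return the old contents while installing the new value v.
        old = self.bit
        self.bit = v
        return old

    def test_and_set(self):
        # Test-and-set is swap with input 1.
        return self.swap(1)

    def test_and_reset(self):
        # Test-and-reset is swap with input 0.
        return self.swap(0)
```

In a concurrent setting, each call to swap would have to appear to take
effect atomically; the sketch ignores concurrency entirely.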

General implementations of swap objects can be very expensive, even
given test-and-set bits. The best known general swap object implementation
is that of Afek, Weisberger, and Weisman [AWW93], which may require as
many as O(n log n) steps to carry out a single swap operation even in the
one-shot case. Whether this cost can be reduced is an interesting open
question.

We do not answer this question, but instead observe that the cost can
be greatly reduced if the size of the swap object is restricted to a single bit.
We give a simple implementation of a swap object from a single max
register [AACH12] that indexes an unbounded array of test-and-set bits. The key
observation is that swap operations on a one-bit register can be linearized
by first separating out groups of swap operations that all have the same
input 0 or 1 (using the max register), and then choosing a single operation
from each group to linearize first (using a test-and-set). Because the swap
object is limited to one bit, knowing whether an operation is linearized first
within its group is enough to determine its return value: it will be equal
to the common input of the group if it is not linearized first and equal to
the other input if it is. No further ordering of operations within a group is
needed.

It is known [AACH12] that unbounded max registers can be implemented
directly from read-write registers, at a cost of O(min(log v, n)) steps for any
operation that leaves a max register with value v. Test-and-set bits can
also be implemented from read-write registers if randomization is
permitted; the costs of the best current implementations are an expected
O(log n) register operations for each test-and-set operation assuming an
adaptive adversary that can react to what the implementation does [AGTV92],
and O(log* n) expected operations assuming an oblivious adversary that
cannot [GW12]. Applying these constructions to our algorithm gives a cost
of either O(max(log n, min(log t, n))) or O(max(log* n, min(log t, n)))
register operations on average for each swap operation, where t is the number
of times the swap object switches between its two values in the linearized
schedule. For typical values of t, we would expect the O(log t) term to
dominate.

2 Model 

We assume a standard asynchronous shared-memory model, with concur- 
rency modeled by interleaving under the control of an adversary sched- 
uler. We are interested in implementations of objects that are wait-free 
(every process finishes in a finite number of steps in any execution) and 
linearizable [HW90] (there exists a sequential execution of the object that
is consistent with the observed execution order).

Our base objects consist of a max register and an array of test-and-set
bits. A max register [AACH12] supports write and read operations,
where a read operation returns the largest value previously written. A
test-and-set bit supports a single operation TAS, which sets the bit to 1 and
returns the previous value. Unless otherwise specified, we assume that both
the max register and the test-and-set bits are initialized to 0. As discussed
previously, we can also use standard techniques to replace these base objects
with ordinary registers.
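
As a rough illustration of the two base objects, the following Python sketch
simulates each of them with a lock so that every operation is atomic. The
lock is an assumption of the sketch and stands in for the atomicity the model
grants to base objects; it is not a register-based construction, and the
class names MaxRegister and TASBit are hypothetical.

```python
import threading


class MaxRegister:
    """Lock-based stand-in for an atomic max register."""

    def __init__(self, initial=0):
        self._value = initial
        self._lock = threading.Lock()

    def write(self, v):
        # A write only has an effect if it raises the stored maximum.
        with self._lock:
            if v > self._value:
                self._value = v

    def read(self):
        # Returns the largest value written so far (or the initial value).
        with self._lock:
            return self._value


class TASBit:
    """Lock-based stand-in for an atomic test-and-set bit."""

    def __init__(self, initial=0):
        self._bit = initial
        self._lock = threading.Lock()

    def tas(self):
        # Set the bit to 1 and return its previous value.
        with self._lock:
            old = self._bit
            self._bit = 1
            return old
```

Only the first tas call on a bit initialized to 0 returns 0; every later call
returns 1.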

3 Implementation 

Pseudocode for the swap operation is given in Algorithm 1. The
implementation uses a single max register maxRound, and an unbounded array of
test-and-set bits t[0 . . . ]. To initialize the swap object to b, set maxRound to
b and initialize t[b] to 1 (as if a TAS operation had already successfully been
performed on it); this is equivalent to running swap(b) with maxRound and
all test-and-set objects initialized to 0 and discarding the result.



1  procedure swap(v)
2      r ← maxRound
3      if r ≢ v (mod 2) then
4          r ← r + 1
5          maxRound ← r
6      if TAS(t[r]) = 0 then
7          return ¬v
8      else
9          return v



Algorithm 1: Pseudocode for a swap operation 



The step complexity of this implementation is O(1). Indeed, each
execution of swap requires either two or three operations on the base objects
depending on the outcome of the test in Line 3.
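
The swap procedure and its step count can be sketched in Python as follows.
This is a minimal simulation under stated assumptions: a single lock makes
each base-object operation atomic (it does not make the construction
register-based), a defaultdict plays the role of the unbounded array t, and
the names OneBitSwapFromTAS and swap_with_steps are hypothetical.

```python
import threading
from collections import defaultdict


class OneBitSwapFromTAS:
    """Sketch of the swap procedure; a lock stands in for atomic base objects."""

    def __init__(self, b=0):
        self._lock = threading.Lock()
        self._max_round = b          # max register maxRound, initialized to b
        self._t = defaultdict(int)   # unbounded array t[0 ...], all bits 0
        self._t[b] = 1               # as if TAS(t[b]) had already succeeded

    def _read_max(self):
        with self._lock:
            return self._max_round

    def _write_max(self, r):
        with self._lock:
            self._max_round = max(self._max_round, r)

    def _tas(self, i):
        with self._lock:
            old = self._t[i]
            self._t[i] = 1
            return old

    def swap_with_steps(self, v):
        """Return (old value, number of base-object operations used)."""
        steps = 1
        r = self._read_max()
        if r % 2 != v:               # parity of r must match the input v
            r += 1
            self._write_max(r)
            steps += 1
        steps += 1
        old = 1 - v if self._tas(r) == 0 else v
        return old, steps

    def swap(self, v):
        return self.swap_with_steps(v)[0]
```

Starting from initial value 0, a first swap(1) uses three base-object
operations and returns 0, while an immediately following swap(1) uses two and
returns 1.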

Both max registers and test-and-set bits can be implemented from
registers. If the max register is implemented from registers using the technique
of [AACH12], the cost becomes O(min(log v, n)), where v is the value in the max
register. It is easy to see that v is bounded by the number of swap
operations, since each swap operation increments it at most once. Test-and-set
bits can also be implemented directly from registers using randomization.
Using the best currently-known implementations, the cost is an expected
O(log n) steps per test-and-set operation [AGTV92] assuming an adaptive
adversary and O(log* n) [GW12] assuming an oblivious adversary. In either
case the cost of the test-and-set will be dominated by the cost of the max
register after a linear number of swap operations in the worst case.
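
The way the two costs combine can be written out as a short calculation. The
sketch below covers the adaptive-adversary case and assumes only what the
text states: at most two max register operations and one test-and-set per
swap, with v the value in the max register. For the oblivious case, log n is
replaced by log* n.

```latex
% Expected cost of one swap when both base objects are built from registers.
\begin{align*}
  \mathrm{E}[\text{cost of swap}]
    &\le 2 \cdot O(\min(\log v, n)) + O(\log n) \\
    &=   O(\max(\log n, \min(\log v, n))).
\end{align*}
```

The second line uses the fact that a sum of two nonnegative terms is at most
twice their maximum.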

4 Linearizability 

To show linearizability, we construct an explicit linearization order based on
the final value of r for each swap operation, with operations sharing the same
value ordered further by the linearization order of the test-and-set bit t[r].

Theorem 1. Algorithm 1 is a linearizable implementation of a swap object.

Proof. Fix an execution of the protocol. 

For each swap operation σ, define r(σ) to be the value of the internal
variable r at the time of the call to TAS(t[r]) in Line 6 of the execution of
σ. Note that r(σ) mod 2 is always equal to the input value v_σ of σ. Let S_i
be the set of all swap operations σ for which r(σ) = i. We will construct
a linearized execution by ordering the sets S_i by increasing i, and ordering
operations within each S_i based on the linearization order for t[i].

To show that this is in fact a linearization, we must show both that it 
respects the observable order of operations and that the resulting execution 
corresponds to a sequential execution of a swap object. 

For the first part, suppose that some operation σ_1 finishes before another
operation σ_2 starts. First let us show that r(σ_1) ≤ r(σ_2). The value r(σ_1) is
either read from maxRound or written to it before σ_1 finishes; the subsequent
read of maxRound by σ_2 thus returns a value r' ≥ r(σ_1), and r(σ_2) is either
r' or r' + 1, which in either case is greater than or equal to r(σ_1). If
r(σ_1) < r(σ_2), then the two operations are in distinct sets S_{r(σ_1)} and
S_{r(σ_2)}, and σ_1 is linearized first. If instead r(σ_1) = r(σ_2), then both
are in the same set S_i. Now because σ_1 accesses t[i] before σ_2, it again
holds that σ_1 is linearized first.

For the second part, we start by showing that there are no gaps in
the sequence of sets S_i. Specifically, we observe that if S_i is nonempty for
i > b + 1, where b is the initial value of maxRound, then so is S_{i-1}. The
reason is that if S_i is nonempty, then either some operation reads i from
maxRound or writes i to maxRound. In either case, because i is not the initial
value of maxRound, there is a first operation σ that writes i to maxRound. This
operation must previously have read i - 1 from maxRound. Since i > b + 1,
i - 1 > b, and so i - 1 can only appear in maxRound if some other operation
σ' writes it. But then σ' ∈ S_{i-1} and S_{i-1} is nonempty as claimed.

Now consider some specific operation σ and let i = r(σ). Recall that
i mod 2 = v_σ, where v_σ is the input to σ. There are two cases, depending
on the value returned by TAS(t[r]) in σ:

• If this value is 0, then we have that (a) σ is linearized first among
all operations in S_i, and (b) σ returns ¬v_σ = (i - 1) mod 2. If S_{i-1}
is nonempty, then there exists a swap(¬v_σ) operation in S_{i-1} that
linearizes immediately before σ, and thus it is correct for σ to return
¬v_σ. If S_{i-1} is empty, then i - 1 ≤ b. It cannot be the case that
i = b, because t[b] is initialized to 1, contradicting the assumption that
TAS(t[i]) returns 0. Nor can we have i < b. It follows that i - 1 = b,
and σ correctly returns the initial value b.

• If this value is 1, then either (a) σ is not linearized as the first operation
in S_i, or (b) σ is linearized as the first operation in S_i and i = b. In the
first case, σ returns the input to the previous operation in S_i; in the
second, it returns the initial value b. In both cases this return value is
correct.

□ 
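
The linearization order used in the proof can be checked experimentally with
a short Python sketch. It runs swaps from several threads, records for each
operation the final value of r and the position of its TAS step, sorts by
that pair, and replays the result against a sequential one-bit swap. The
sketch makes several assumptions: a lock makes each base-object step atomic,
the ticket counter is a hypothetical device for recording the order of TAS
steps, and the function name run_and_check is invented here. It checks only
that the sorted order is a valid sequential execution, not that it respects
real-time order.

```python
import threading
from collections import defaultdict


def run_and_check(inputs_per_thread, b=0):
    """Run swaps concurrently, then replay them in the proof's order."""
    lock = threading.Lock()          # makes each base-object step atomic
    shared = {"max_round": b, "ticket": 0}
    t = defaultdict(int)
    t[b] = 1                         # initialization for initial value b
    log = []                         # records (r, TAS ticket, input, result)

    def swap(v):
        with lock:
            r = shared["max_round"]
        if r % 2 != v:
            r += 1
            with lock:
                shared["max_round"] = max(shared["max_round"], r)
        with lock:
            old = t[r]
            t[r] = 1
            shared["ticket"] += 1    # global order in which TAS steps occur
            result = 1 - v if old == 0 else v
            log.append((r, shared["ticket"], v, result))

    def worker(inputs):
        for v in inputs:
            swap(v)

    threads = [threading.Thread(target=worker, args=(inputs,))
               for inputs in inputs_per_thread]
    for th in threads:
        th.start()
    for th in threads:
        th.join()

    # Order the sets S_i by increasing i, and each S_i by the TAS order on t[i].
    order = sorted(log, key=lambda rec: (rec[0], rec[1]))
    bit = b
    for _r, _ticket, v, result in order:
        if result != bit:            # a sequential swap must return the old bit
            return False
        bit = v
    return True
```

A call such as run_and_check([[0, 1, 1, 0], [1, 0, 0, 1]]) should return True
for every interleaving if the argument of Theorem 1 holds; a False result
would indicate a bug in the sketch rather than in the proof.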

5 Conclusion 

We've shown that it is possible to build a very efficient swap object from test- 
and-set bits and max registers, if the swap object is limited to two values. 
The key idea is that we can alternate sequences of swap(0) and swap(1)
operations so that the operations within each sequence can be linearized 
with a single test-and-set bit. Because there are only two possible values, 
the return value of each swap operation can be computed directly from the 
result of the test-and-set operation: either it is linearized after another swap 
with the same input, or it is linearized after another swap with a different 
input. Unfortunately, there does not seem to be any direct way to expand 
this trick to handle more than two inputs. 

From the work of Afek, Weisberger, and Weisman [AWW93], we know
that a general swap object can be implemented directly from test-and-set 
bits and read-write registers, but the cost per swap operation is superlinear 
in the number of processes. This leaves a huge complexity gap between the 
two-valued case and the general case. A natural next step might be to look 
at less restricted cases such as three-valued swap. This object is general 
enough to break the specific technique used here for two-valued swap, but 
may still allow for a highly efficient implementation. 